Segmentation of DNA sequences using Normalized Maximum Likelihood models for uncovering gene duplications

Author

  • Ioan Tăbuş
Abstract

The normalized maximum likelihood (NML) model [2]-[4] for a class of Markov sources [6] was recently used to compress full genomes, achieving the best existing compression results for the human genome [1]. We show that one of the underlying biological features that the compression algorithm implicitly uncovers is approximate gene duplication. We proposed a refined method, based on the same NML models, that segments DNA sequences in order to uncover gene duplications [5]. Several analysis tasks on genomic sequences involve a preliminary segmentation or clustering of the data, which can be performed by a number of techniques based on various similarity measures. Here we review and further pursue the application of MDL techniques to genomic sequence analysis. Sequence matching is used to uncover gene duplications, aided by a preliminary segmentation of a complex DNA locus known to have evolved through a series of duplications.
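As a rough illustration of how an NML code length can flag an approximate duplication, the sketch below scores a DNA block against a candidate reference block of the same length. It is a simplified stand-in, not the method of [1] or [5]: it assumes each base either matches the reference with probability θ or is one of the three other bases with probability (1 − θ)/3, and the function names `log2_normalizer` and `nml_code_length` are hypothetical.

```python
from math import exp, lgamma, log, log2


def _klogp(k: int, p: float) -> float:
    """k * ln(p), with the convention 0 * ln(0) = 0."""
    return k * log(p) if k > 0 else 0.0


def log2_normalizer(n: int) -> float:
    """log2 of the NML normalizer: the sum, over all length-n blocks,
    of the maximized likelihood under the match/mismatch model."""
    total = 0.0
    for k in range(n + 1):  # k = number of matching positions
        log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        total += exp(log_binom + _klogp(k, k / n) + _klogp(n - k, (n - k) / n))
    return log2(total)


def nml_code_length(block: str, reference: str) -> float:
    """Bits needed to encode `block` given `reference` (assumed model)."""
    n = len(block)
    if n == 0 or n != len(reference):
        raise ValueError("blocks must be non-empty and of equal length")
    k = sum(a == b for a, b in zip(block, reference))
    # Maximum-likelihood estimate is theta = k / n; each mismatch is
    # spread uniformly over the three remaining bases.
    ml_nats = _klogp(k, k / n) + _klogp(n - k, (n - k) / (3 * n))
    return -ml_nats / log(2) + log2_normalizer(n)


if __name__ == "__main__":
    block = "ACGTACGTACGTACGTACGT"
    near_copy = "ACGTACGTACTTACGTACGT"  # one substitution
    print(nml_code_length(block, near_copy), "bits vs", 2 * len(block), "bits raw")
```

A block whose code length falls well below the 2 bits per base of a plain encoding is a candidate for having been copied, with mutations, from the reference; a block that agrees with the reference only at the chance rate of about one position in four costs slightly more than the plain encoding.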

Similar resources

Normalized Maximum Likelihood Models with Memory for Genomics

This paper will review the recent applications of MDL techniques for genomic sequence analysis presented in detail in [12] and [13]. The normalized maximum likelihood (NML) model [5]-[8] for a class of Markov sources [13] was recently used for the compression of full genomes, obtaining for the human genome the best existing compression results [4]. We have shown that one of the underlying biolo...

An Evolutionary and Phylogenetic Study of the BMP15 Gene

DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...

Study on phylogenetic status of Hari barbel Luciobarbus conocephalus (Kessler, 1872) from Hari river using Cytb gene

Recently, Luciobarbus conocephalus was reported from the Hari River for the first time, but authors disagree about the validity of this species, because some of them treat it as a subspecies or synonym of L. capito. The present study was therefore conducted to investigate the phylogenetic status and validity of this species. For this purpose, specimens captured from Hari Riv...

Molecular phylogeny of the genus Eumeces Wiegmann, 1834 (Reptilia: Scincidae) in Iran, based on mitochondrial DNA of the 16S gene

Phylogenetic relationships between Eumeces schneiderii princeps and Eumeces schneiderii pavimentatus were investigated using 509 bp partial sequences of the 16S mitochondrial gene. Analyses were done under the maximum-likelihood criterion (RAxML) on 52 specimens from over 20 geographically distinct localities. Our molecular results proposed two well-supported major clades by their phylogenetic positions, gene...

Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of first- and second-order Markov chains for gene prediction were evaluated using complete double-stranded DNA virus genomes. There were two approaches for prediction of each Markov Model parameter,...

Publication date: 2008